Document Image Classification Via AdaBoost and ECOC Strategies Based on SVM Learners
Authors
Abstract
In this paper, we describe easily extractable features and an approach for document image retrieval and classification at the spatial level. The approach is based on image content and exploits visual similarity to provide high-speed classification of noisy text document images without optical character recognition (OCR). Our method applies a bag-of-visual-words (BoVW) model to the designed descriptors and uses a RandomWindow (RW) technique to capture the structural relationships of the spatial layout. Using features based on this information, we analyze different multiclass classification methods as well as an ensemble classifier method with a Support Vector Machine (SVM) as the base learner. The results demonstrate that the proposed method for obtaining structural relations is competitive for noisy document image categorization.
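As a rough illustration of the ECOC strategy named in the title, the sketch below shows only the decoding step: each class is assigned a binary codeword, one binary learner (an SVM in the paper's setting) predicts each bit, and the class whose codeword is nearest in Hamming distance wins. The code matrix, the class names, and the predicted bits are all hypothetical values chosen for the example; they are not taken from the paper, and the paper's actual coding design is not described in the abstract.

```python
# Minimal ECOC decoding sketch (illustrative only).
# CODE_MATRIX, the class names, and the example bits are assumptions,
# not values from the paper.

def hamming_distance(a, b):
    """Count the positions where two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(code_matrix, predicted_bits):
    """Return the class label whose codeword is closest to predicted_bits."""
    return min(
        code_matrix,
        key=lambda label: hamming_distance(code_matrix[label], predicted_bits),
    )


# Hypothetical 4-class, 7-bit code matrix. Every pair of codewords differs
# in 4 positions, so a single wrong binary prediction can be corrected.
CODE_MATRIX = {
    "letter":  (0, 0, 0, 0, 0, 0, 0),
    "form":    (1, 1, 1, 1, 0, 0, 0),
    "invoice": (1, 1, 0, 0, 1, 1, 0),
    "memo":    (1, 0, 1, 0, 1, 0, 1),
}

# Pretend the 7 binary learners produced the "invoice" codeword
# with the third bit flipped by a misclassification.
noisy_bits = (1, 1, 1, 0, 1, 1, 0)
decoded = ecoc_decode(CODE_MATRIX, noisy_bits)
```

In this toy setup `decoded` is still `"invoice"` despite the flipped bit, which is the error-correcting property that motivates ECOC over plain one-vs-rest voting.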
Similar resources
AdaBoost and Support Vector Machines for Unbalanced Data Sets
Boosting is a method for improving the accuracy of a given learning algorithm by combining multiple weak learners to “boost” them into a strong learner. The gist of AdaBoost is the assumption that even though a weak learner cannot perform well on all classifications, each one is good at some subsets of the given data with a certain bias, so that by assembling many weak learners together, ...
Document Analysis And Classification Based On Passing Window
In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in preprocessing by removing noise, binarizing, and detecting and correcting image skew. In document segmentation, an algorith...
Empirical analysis of support vector machine ensemble classifiers
Ensemble classification, which combines the results of a set of base learners, has received much attention in the machine learning community and has demonstrated promising capabilities in improving classification accuracy. Unlike neural network or decision tree ensembles, support vector machine (SVM) ensembles have not been the subject of comprehensive empirical research. To fill this void, this paper an...
Learning Document Image Features With SqueezeNet Convolutional Neural Network
The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state-of-the-art model for this task. However, these classifiers have two major drawbacks: the huge computational power demand for...
Traffic sign classification using error correcting techniques
Traffic sign classification is a challenging problem in Computer Vision due to the high variability of sign appearance in uncontrolled environments. Lack of visibility, illumination changes, and partial occlusions are just a few of the problems. In this paper, we introduce a classification technique for traffic sign recognition by means of Error Correcting Output Codes. Recently, new proposals of cod...